Abstract. We use a method of Goulden and Jackson to bound $\mathrm{freq}_1(K)$, the
limiting frequency of 1 in the Kolakoski word $K$. We prove that
$|\mathrm{freq}_1(K) - 1/2| \le 17/762$, assuming the limit exists, and establish the
semi-rigorous bound $|\mathrm{freq}_1(K) - 1/2| \le 1/46$.



1. Introduction 

The Kolakoski word is an infinite sequence of 1's and 2's that is equal to its own
run length sequence:

    K = 22 11 2 1 22 1 22 11 2 11 22 1 2 ...
    K = 2  2  1 1 2  1 2  2  1 2  2  1 1 ...

Up to the choice of the first term, $K$ is defined uniquely by this property. Beginning
with 1 instead of 2 produces the word $1K$, which was introduced by Kolakoski [5], [6].
Let $o_n$ be the number of 1's occurring in the first $n$ terms of $K$, and let

\[ \mathrm{freq}_1(K) := \lim_{n \to \infty} \frac{o_n}{n}. \]
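As a quick illustration of this definition, the following Python sketch generates a prefix of $K$, checks the self-describing property on that prefix, and computes the empirical frequency $o_n/n$. The names `kolakoski` and `run_lengths` are ours, not the paper's, and a finite prefix can only suggest, not prove, anything about the limit.

```python
from itertools import groupby

def kolakoski(n, first=2):
    """Return the first n terms of the Kolakoski word beginning with `first`."""
    seq = []
    letter = first
    i = 0  # index of the term that gives the length of the next run
    while len(seq) < n:
        # If term i has not been written yet, the next run starts at
        # position i, so term i is the current letter itself.
        run = seq[i] if i < len(seq) else letter
        seq.extend([letter] * run)
        letter = 3 - letter  # alternate between 1 and 2
        i += 1
    return seq[:n]

def run_lengths(seq):
    """Lengths of the maximal runs of equal letters in seq."""
    return [len(list(group)) for _, group in groupby(seq)]

K = kolakoski(10_000)
rl = run_lengths(K)
# The last run of a prefix may be cut short, so compare all runs but the last.
self_describing = rl[:-1] == K[:len(rl) - 1]
freq = K.count(1) / len(K)  # empirical o_n / n for n = 10000
print(self_describing, freq)
```

Passing `first=1` produces the word $1K$ mentioned above.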

It was conjectured by Dekking [2] that this limit exists and equals 1/2. Kimberling's
web page [3], where this conjecture is listed among several others, is responsible
for its popularity. In this paper we use the Goulden-Jackson cluster method to
give bounds on $\mathrm{freq}_1(K)$ consistent with the conjecture. In particular, we prove
the following.

Theorem. If the limiting frequency $\mathrm{freq}_1(K)$ of 1 in the Kolakoski word exists,
then

\[ \left| \mathrm{freq}_1(K) - \frac{1}{2} \right| \le \frac{17}{762} \approx 0.0223097. \]



This method was explored in a different setting by Chvátal [1], who produced
this bound and several better bounds, reducing the difference from 1/2 to

\[ \left| \mathrm{freq}_1(K) - \frac{1}{2} \right| \le \frac{35}{41754} \approx 0.0008382. \]



We thank Jean-Paul Allouche for pointing out Chvátal's paper to us.
2. The Goulden-Jackson cluster method



2.1. Description. The Goulden-Jackson cluster method [3] is an efficient way of 
counting the number of words w on a given alphabet such that no subword of w 
appears in a given set S. We say that w avoids S. 

Here we use an extension of the method by Noonan and Zeilberger that tracks
the frequency of the letters in a word. Define the weight of a word $w$ to be

\[ \mathrm{weight}(w) = x_1^{|w|_1} x_2^{|w|_2} t^{|w|}, \]

where $|w|_a$ is the number of occurrences of $a$ in $w$ and $|w|$ is the length of $w$. Let
$W$ be the set of words on {1, 2} that avoid $S$. Let the weight of $W$ be

\[ \mathrm{weight}(W) = \sum_{w \in W} \mathrm{weight}(w) = \sum_{n=0}^{\infty} p_n(x_1, x_2)\, t^n, \]

where $p_n(x_1, x_2)$ is a polynomial in $x_1$ and $x_2$ that carries information about
the set of length-$n$ words avoiding $S$. The Goulden-Jackson algorithm computes
$\mathrm{weight}(W)$ as a rational expression in $x_1$, $x_2$, and $t$. We refer the reader to the
papers cited above for details of the algorithm.



2.2. Avoided subwords. To use the Goulden-Jackson method we must find words
that never appear as subwords of $K$. We accomplish this by capitalizing on the fact
that if $w$ is a subword of $K$ then the run length sequence of $w$ is also a subword of
$K$. This means that if $w$ is not a subword of $K$ then any word whose run length
sequence contains $w$ is also not a subword of $K$. We start by observing that the
word 3 does not occur in $K$ because $K$ is a word on {1, 2}. Therefore, no word
with 3 in its run length sequence can be a subword of $K$ either; in particular, 111
and 222 cannot be subwords of $K$.

Now that we know that K avoids 111 and 222, we know that no word with 111 
or 222 in its run length sequence can occur in K. Namely, K avoids 12121 and 
21212 (since their run length sequences contain 111), and K also avoids 112211 and 
221122 (since their run length sequences are 222). 

There is a subtlety here, which is that 111 is the run length sequence of the
words 212 and 121, yet these words both do appear in $K$. However, they only
occur as part of the larger words 22122 and 11211, and these words have run length
sequences 212, not 111. We pad 212 with 1's on both ends to ensure that the run
length sequence contains 111, and similarly we pad 121 with 2's. This padding is
necessary whenever the run length sequence begins or ends with 1.

We iterate this process to obtain additional words that $K$ avoids, producing the
tree in Figure 1. Define $S_d$ to be the set of words in the tree in levels 1 through $d$ (i.e.,
not including the root, 3). There are $2^{d+1} - 2$ words in $S_d$.
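The construction above can be sketched in Python as follows. This is our reading of the padding rule, not code from the paper: each word $u$ in the tree gets two children, obtained by writing runs of lengths $u_1, u_2, \ldots$ with alternating letters (starting with 1 or with 2) and adding one opposite letter at any end where the run length is 1. The function names are hypothetical.

```python
def kolakoski(n, first=2):
    """First n terms of the Kolakoski word beginning with `first`."""
    seq, letter, i = [], first, 0
    while len(seq) < n:
        run = seq[i] if i < len(seq) else letter
        seq.extend([letter] * run)
        letter = 3 - letter
        i += 1
    return seq[:n]

def flip(letter):
    return "2" if letter == "1" else "1"

def children(u):
    """The two words whose run length sequence contains u, padded at 1-ends."""
    kids = []
    for start in "12":
        letter, runs = start, []
        for r in u:                    # each digit of u is a run length
            runs.append(letter * int(r))
            letter = flip(letter)
        w = "".join(runs)
        if u[0] == "1":                # close off a length-1 run on the left
            w = flip(w[0]) + w
        if u[-1] == "1":               # close off a length-1 run on the right
            w = w + flip(w[-1])
        kids.append(w)
    return kids

def avoided_words(d):
    """S_d: all words in levels 1 through d of the tree rooted at '3'."""
    level, words = ["3"], []
    for _ in range(d):
        level = [w for u in level for w in children(u)]
        words.extend(level)
    return words

S2 = avoided_words(2)   # 111, 222, 21212, 12121, 112211, 221122
prefix = "".join(map(str, kolakoski(5000)))
none_occur = all(w not in prefix for w in avoided_words(4))
print(S2, none_occur)
```

Checking a finite prefix of $K$ is only a sanity check on the construction; the argument in the text is what shows that these words never occur.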

This approach to producing words avoided by $K$ is symmetric with respect to
interchanging 1 and 2, so if we use all words in $S_d$ it follows that $p_n(x_1, x_2)$ is
symmetric in $x_1$ and $x_2$. Because of this symmetry, all our bounds have the form
$|\mathrm{freq}_1(K) - 1/2| \le \varepsilon$. Experiments with asymmetric word sets have not improved
upon the bounds obtained with symmetric sets, so we do not pursue them here.



Figure 1. The first four generations in an infinite tree of words
that the Kolakoski word avoids.

3. Results 

We have three different (but closely related) ways of using the Goulden-Jackson
method to produce bounds on $\mathrm{freq}_1(K)$. When we can compute the full generating
function $\mathrm{weight}(W)$, the denominator gives a bound directly. For large sets $S$
computing the generating function as a rational expression is not computationally
feasible; in this case we resort to computing the first few terms of the series expansion.
Each term of the series provides a bound on $\mathrm{freq}_1(K)$, although in general
these bounds are not as good as the ones we get from the denominator. Finally, by
examining many terms of the series we can often experimentally determine a closed
form for the bounds being produced, which, after taking a limit, gives an improved
bound that is semi-rigorous.

3.1. Bounds from the denominator. From the term $p_n(x_1, x_2)\, t^n$ we can determine
the minimum number of 1's that occur in an $n$-letter word avoiding $S$; this is
the minimum degree in $x_1$ of this polynomial. Let

\[ \operatorname{minratio} \sum_{i=1}^{k} c_i\, x_1^{a_i} x_2^{m_i - a_i} t^{m_i} = \min_{1 \le i \le k} \frac{a_i}{m_i} \]

for $c_i \in \mathbb{Z}$ and $m_i \ge 1$.

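If $\mathrm{weight}(W) = N/(1 - D)$ for some polynomials $N$ and $D$, then
$\mathrm{weight}(W) = \sum_{n=0}^{\infty} N D^n$, and

\[ \operatorname{minratio} N D^n \to \operatorname{minratio} D \]

as $n \to \infty$. Thus the denominator of $\mathrm{weight}(W)$ dictates the asymptotic behavior
of $\operatorname{minratio} p_n(x_1, x_2)\, t^n$.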

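For example, using the set $S_1 = \{111, 222\}$ produces the generating function

\[ \mathrm{weight}(W) = \frac{(1 + x_1 t + x_1^2 t^2)(1 + x_2 t + x_2^2 t^2)}{1 - x_1^2 x_2^2 t^4 - x_1^2 x_2 t^3 - x_1 x_2^2 t^3 - x_1 x_2 t^2}. \]

Here $\operatorname{minratio} D = 1/3$, and the maximum ratio is 2/3. Therefore if the limit exists
we have $|\mathrm{freq}_1(K) - 1/2| \le 1/6$.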
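The minratio computation for $S_1$ is small enough to check by hand, but a short Python sketch makes the definition concrete. Here a polynomial is represented, as an assumption of this sketch, by a dictionary mapping (degree in $x_1$, degree in $t$) to its coefficient; the degree in $x_2$ is implied. The terms of $D$ are those of the denominator above.

```python
from fractions import Fraction

def minratio(poly):
    """Smallest a/m over the nonzero terms c * x1^a * x2^(m-a) * t^m."""
    return min(Fraction(a, m) for (a, m), c in poly.items() if c != 0)

def maxratio(poly):
    """Largest a/m over the nonzero terms."""
    return max(Fraction(a, m) for (a, m), c in poly.items() if c != 0)

# D for S_1 = {111, 222}: x1*x2*t^2 + x1^2*x2*t^3 + x1*x2^2*t^3 + x1^2*x2^2*t^4
D = {(1, 2): 1, (2, 3): 1, (1, 3): 1, (2, 4): 1}

half = Fraction(1, 2)
low, high = minratio(D), maxratio(D)
eps = max(half - low, high - half)
print(low, high, eps)  # 1/3 2/3 1/6
```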

The minratio for $S_2$ is also 1/3 despite additional words. However, using $S_3$
produces the denominator

[displayed polynomial in $x_1$, $x_2$, $t$; terms illegible in the source]
with $\operatorname{minratio} D = 4/9$, giving $|\mathrm{freq}_1(K) - 1/2| \le 1/18$.

3.2. Bounds from series terms. Computing $\mathrm{weight}(W)$ as a rational expression
requires solving a system of linear equations, and this system is large when there
are many words in $S$. Therefore, to compute improved bounds we use a modified
algorithm, available in the function wGJseries in Zeilberger's package DAVID_IAN
[8], that computes only the first $N$ terms of the series. The following lemma
says that each term puts a bound on $\mathrm{freq}_1(K)$. The idea is that bounding the
number of 1's in every length-$n$ subword of $K$ produces a bound that extends to
all of $K$. Recall that $o_m$ is the number of 1's in the first $m$ terms of $K$.

Lemma. If $a \le |w|_1 \le b$ for every length-$n$ subword $w$ of $K$, then
$a/n \le \mathrm{freq}_1(K) \le b/n$ if the limit exists.

Proof. For an arbitrary $m$, we have $m = qn + r$ for some $q \in \mathbb{Z}$ and $0 \le r < n$.
Partition the first $m$ terms of $K$ into $q$ consecutive blocks of length $n$, leaving a
remainder block of length $r$. The number of 1's in each block of length $n$ is at most
$b$, so $o_m \le qb + r$. Similarly, $o_m \ge qa$. Therefore for all $m$ we have

\[ \frac{qa}{m} \le \frac{o_m}{m} \le \frac{qb + r}{m}. \]

Substituting $q = \frac{m - r}{n}$ and letting $m \to \infty$ we get that

\[ \frac{a}{n} \le \lim_{m \to \infty} \frac{o_m}{m} \le \frac{b}{n} \]

if the limit in question exists. □
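The limiting step in the proof can be spelled out as follows; this is a routine expansion that uses only $m = qn + r$ and $0 \le r < n$.

```latex
\[
\frac{qa}{m} = \frac{a}{n}\cdot\frac{m-r}{m} = \frac{a}{n}\Bigl(1 - \frac{r}{m}\Bigr),
\qquad
\frac{qb + r}{m} = \frac{b}{n}\Bigl(1 - \frac{r}{m}\Bigr) + \frac{r}{m}.
\]
Since $0 \le r < n$ with $n$ fixed, $r/m \to 0$ as $m \to \infty$, so the lower
bound tends to $a/n$ and the upper bound tends to $b/n$.
```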

For example, we compute $\mathrm{weight}(W)$ with $S_1 = \{111, 222\}$ out to term $N = 5$
to be

\[
\begin{aligned}
1 &+ (x_1 + x_2)\, t + (x_1^2 + 2 x_1 x_2 + x_2^2)\, t^2 + (3 x_1^2 x_2 + 3 x_1 x_2^2)\, t^3 \\
  &+ (2 x_1^3 x_2 + 6 x_1^2 x_2^2 + 2 x_1 x_2^3)\, t^4 + (x_1^4 x_2 + 7 x_1^3 x_2^2 + 7 x_1^2 x_2^3 + x_1 x_2^4)\, t^5.
\end{aligned}
\]

From the coefficient of $t^3$, we conclude that $1 \le |w|_1 \le 2$ for every word of length
3 avoiding 111 and 222. This information gives the bound $|\mathrm{freq}_1(K) - 1/2| \le 1/6$,
which in this case is the same bound obtained from the denominator. While we
could have used any coefficient in the series expansion to get a bound, the bounds
we get from the coefficients of $t^4$ and $t^5$ are actually worse.
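For a word set this small, the series coefficients can be reproduced by direct enumeration. The Python sketch below does that and applies the lemma to each term; it is a brute-force check, not the Goulden-Jackson algorithm and not the wGJseries function, and the names `coefficients` and `epsilon` are ours.

```python
from collections import Counter
from fractions import Fraction

def coefficients(forbidden, n_max):
    """coeffs[n][a] = number of length-n words on {1, 2} that avoid every
    word in `forbidden` and contain exactly a 1's."""
    words = [""]
    coeffs = [Counter({0: 1})]
    for _ in range(n_max):
        # Extend each avoiding word by one letter; a new forbidden
        # occurrence can only appear as a suffix of the extended word.
        words = [w + a for w in words for a in "12"
                 if not any((w + a).endswith(f) for f in forbidden)]
        coeffs.append(Counter(w.count("1") for w in words))
    return coeffs

def epsilon(coeff, n):
    """Bound on |freq - 1/2| implied by the min and max number of 1's."""
    a, b = min(coeff), max(coeff)
    half = Fraction(1, 2)
    return max(half - Fraction(a, n), Fraction(b, n) - half)

p = coefficients(["111", "222"], 5)
bounds = {n: epsilon(p[n], n) for n in range(1, 6)}
print(dict(p[5]), bounds)
```

Here `p[5]` records the coefficients 1, 7, 7, 1 of $t^5$, and `bounds` gives 1/6 at $n = 3$ but only 1/4 and 3/10 at $n = 4$ and $n = 5$.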

Performing similar computations on $S_d$ for larger $d$ produces better bounds. The
following table gives the best bound $\varepsilon(n) = 1/2 - \operatorname{minratio} p_n(x_1, x_2)\, t^n$ achieved
among the first $N$ terms. Computing $N = 800$ terms for $d = 5$ took a day and a
half.



d 


\Sd\ 


N 


n 


e{n) 


1 


2 


200 


3 


1/6 


2 


6 


200 


3 


1/6 


3 


14 


200 


9 


1/18 


4 


30 


500 


498 


17/498 


5 


62 


800 


762 


17/762 


6 


126 


600 


555 


5/222 



The best bound here is $|\mathrm{freq}_1(K) - 1/2| \le 17/762$, provided by $d = 5$.
For $1 \le d \le 3$ the bounds achieved are best possible for these word sets; indeed
they are the same bounds obtained from $\operatorname{minratio} D$ for $\mathrm{weight}(W)$. For $d \ge 4$,
computing more terms will produce increasingly better bounds, although for a
fixed $d$ the bounds approach $1/2 - \operatorname{minratio} D$, as discussed in the following section.
Likewise, using more words should produce better bounds, although this increases
the computation time.

3.3. Implied bounds. In fact the sequence of minimum degrees of $x_1$ in $p_n(x_1, x_2)$
has the simple structure of a linear quasi-polynomial for sufficiently large $n$. Moreover,
the successive maxima of the sequence $\operatorname{minratio} p_n(x_1, x_2)\, t^n$ eventually occur
in just one of the residue classes.

Having computed several terms for $d = 1$ it is not difficult to guess that for $n \ge 1$
the minimum degree of $x_1$ in $p_n(x_1, x_2)$ is given by the linear quasi-polynomial

\[
\begin{cases}
\frac{n}{3} & \text{if } n \equiv 0 \bmod 3, \\
\frac{n-1}{3} & \text{if } n \equiv 1 \bmod 3, \\
\frac{n-2}{3} & \text{if } n \equiv 2 \bmod 3.
\end{cases}
\]

Therefore $\operatorname{minratio} p_n(x_1, x_2)\, t^n \to 1/3$, and in fact the limit is attained every three
terms beginning at $n = 3$. The sequence of minimum degrees for $d = 2$ is identical
to that for $d = 1$.
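The guess for $d = 1$ can be checked for many more terms without expanding the series. The Python sketch below is a dynamic program of our own (not the paper's method): it tracks only the last few letters of each avoiding word together with the fewest 1's seen so far. The function name `min_ones` is hypothetical.

```python
def min_ones(forbidden, n_max):
    """result[n] = minimum number of 1's in a length-n word on {1, 2}
    avoiding every word in `forbidden` (assumes such words exist)."""
    longest = max(len(f) for f in forbidden)
    keep = longest - 1            # suffix length that must be remembered
    best = {"": 0}                # suffix -> fewest 1's among words ending in it
    result = [0]
    for _ in range(n_max):
        nxt = {}
        for suffix, ones in best.items():
            for a in "12":
                w = suffix + a
                if any(w.endswith(f) for f in forbidden):
                    continue      # appending a would complete a forbidden word
                key = w[-keep:] if keep > 0 else ""
                val = ones + (1 if a == "1" else 0)
                if key not in nxt or val < nxt[key]:
                    nxt[key] = val
        best = nxt
        result.append(min(best.values()))
    return result

degrees = min_ones(["111", "222"], 30)
# For d = 1 the quasi-polynomial above is simply floor(n / 3).
matches = all(degrees[n] == n // 3 for n in range(1, 31))
print(degrees[:10], matches)
```

The same function accepts larger word sets such as $S_3$, at the cost of remembering longer suffixes.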

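For $d = 3$, $n \cdot \operatorname{minratio} p_n(x_1, x_2)\, t^n$ is given by

\[
\begin{cases}
4 \cdot \frac{n}{9} & \text{if } n \equiv 0 \bmod 9, \\
4 \cdot \frac{n-1}{9} & \text{if } n \equiv 1 \bmod 9, \\
4 \cdot \frac{n-2}{9} & \text{if } n \equiv 2 \bmod 9, \\
4 \cdot \frac{n-3}{9} + 1 & \text{if } n \equiv 3 \bmod 9, \\
4 \cdot \frac{n-4}{9} + 1 & \text{if } n \equiv 4 \bmod 9, \\
4 \cdot \frac{n-5}{9} + 1 & \text{if } n \equiv 5 \bmod 9, \\
4 \cdot \frac{n-6}{9} + 2 & \text{if } n \equiv 6 \bmod 9, \\
4 \cdot \frac{n-7}{9} + 2 & \text{if } n \equiv 7 \bmod 9, \\
4 \cdot \frac{n-8}{9} + 3 & \text{if } n \equiv 8 \bmod 9.
\end{cases}
\]

The limit, 4/9, is first attained at $n = 9$, producing $\varepsilon = 1/18$.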

For higher values of $d$, the limit is not attained by any term. The sequences of
minimum ratios for $d = 4$ and $d = 5$ are eventually linear quasi-polynomials. Using
$d = 6$ iterations of words, one finds that at least the first 600 terms in the series
have the same minratio as those for $d = 5$, with the exception of $n = 62$; therefore
the same eventual quasi-polynomial seems to hold.

For $d = 4$ the quasi-polynomial has modulus 15. For residue class $n \equiv i \bmod 15$
the main term is $7 \cdot \frac{n-i}{15}$ and the constant terms for $i = 0, 1, \ldots, 14$ are

-1, -1, 0, 1, 1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5.

The successive maxima are $(7m + 1)/(15m + 3)$ for $m \ge 2$ and occur at terms
$n = 15m + 3$. Thus the limit is 7/15, producing $\varepsilon = 1/30$.
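The arithmetic for $d = 4$ can be replayed in a few lines of Python. The sketch below takes the constant terms listed above as given and, as a simplifying assumption of ours, applies the quasi-polynomial to every $n$ from 30 to 500 even though the text only claims it for sufficiently large $n$. It then locates the best term among the first 500 and the limiting bound.

```python
from fractions import Fraction

# Constant terms for residues i = 0, ..., 14 (copied from the text above).
CONSTANTS_D4 = [-1, -1, 0, 1, 1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5]

def min_degree_d4(n):
    """Conjectured minimum degree of x1 in p_n for d = 4: 7*(n-i)/15 + c_i."""
    m, i = divmod(n, 15)
    return 7 * m + CONSTANTS_D4[i]

def ratio(n):
    return Fraction(min_degree_d4(n), n)

half = Fraction(1, 2)
best_n = max(range(30, 501), key=ratio)   # term with the largest minratio
best_eps = half - ratio(best_n)           # bound from that term
limit_eps = half - Fraction(7, 15)        # bound implied by the limit 7/15
print(best_n, best_eps, limit_eps)
```

With these inputs the best term is $n = 498 = 15 \cdot 33 + 3$, matching the table entry 17/498 for $d = 4$.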






E. Kupin and E. Rowland 



For $d = 5$ the modulus is 69. The main term is $33 \cdot \frac{n-i}{69}$ for $n \equiv i \bmod 69$, and
the constant terms are

-1, -1, 0, 1, 1, 1, 2, 2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 8, 8, 8, 9, 9, 9, 10, 11, 11,

12, 12, 13, 13, 14, 14, 15, 15, 15, 16, 17, 17, 18, 18, 18, 19, 19, 20, 21, 21, 21,

22, 22, 22, 23, 24, 24, 25, 25, 25, 26, 26, 27, 28, 28, 28, 29, 29, 30, 31, 31, 31.

The successive maxima are $(33m + 1)/(69m + 3)$ for $m \ge 3$ and occur at terms
$n = 69m + 3$; for example, $m = 11$ produces the best rigorous bound

\[ \varepsilon = \frac{1}{2} - \frac{33 \cdot 11 + 1}{69 \cdot 11 + 3} = \frac{17}{762}. \]

Therefore most probably $\operatorname{minratio} D = 33/69$ for $\mathrm{weight}(W)$ in this case, and

$\left| \mathrm{freq}_1(K) - \frac{1}{2} \right|$